A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accelerating WiMAX System Design with FPGAs

WiMAX, or the IEEE 802.16 standard for broadband wireless access, is increasingly gaining in popularity as a technology with significant market potential. This paper first provides an overview of the existing and developing 802.16 standards and their key differences/applications. The PHY and MAC layers of a typical WiMAX base station are then described. The associated implementation challenges ...

متن کامل

3D Recursive Gaussian IIR on GPUs and FPGAs A Case Study for Accelerating Bandwidth-Bounded Applications

GPU devices typically have a higher off-chip bandwidth than FPGA-based systems. Thus typically GPU should perform better for bandwidth-bounded massive parallel applications. In this paper we present our implementations of a 3D recursive Gaussian IIR on multicore CPU, many-core GPU and multi-FPGA platforms. Our baseline implementation on the CPU features the smallest arithmetic computation (2 MA...

متن کامل

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow3Darch...

متن کامل

Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work only focus on accelerating matrix multiplication on FPGA by adopting a linear systolic array. This paper towards the extension of this architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we propose a work-ste...

متن کامل

Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design

Basic Linear Algebra Subprograms (BLAS) play key role in high performance and scientific computing applications. Experimentally, yesteryear multicore and General Purpose Graphics Processing Units (GPGPUs) are capable of achieving up to 15 to 57% of the peak performance at 65W to 240W of power respectively in underlying platform for compute bound operations like Double/Single Precision General M...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Electronics

سال: 2019

ISSN: 2079-9292

DOI: 10.3390/electronics8010065